ELUCNN for explainable COVID-19 diagnosis

COVID-19 is a disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a positive-sense single-stranded RNA virus. Several noteworthy variants of SARS-CoV-2 were declared by WHO as Alpha, Beta, Gamma, Delta, and Omicron. As of 13 December 2022, it had caused 6.65 million deaths and over 649 million confirmed positive cases. Based on the convolutional neural network (CNN), this study first proposes a ten-layer CNN as the backbone model. Then, the exponential linear unit (ELU) is introduced to replace ReLU, and the traditional convolutional block is transformed into conv-ELU. Finally, an ELU-based CNN (ELUCNN) model is proposed for COVID-19 diagnosis. Besides, the MDA strategy is used to enlarge the training set. We develop a mobile app integrating ELUCNN, and the web app runs on a client–server structure. Ten runs of tenfold cross-validation show that our model yields a sensitivity of 94.41 ± 0.98, a specificity of 94.84 ± 1.21, an accuracy of 94.62 ± 0.96, and an F1 score of 94.61 ± 0.95. The ELUCNN model and mobile app are effective in COVID-19 diagnosis and give better results than 14 state-of-the-art COVID-19 diagnosis models concerning accuracy.


Introduction
Coronavirus disease 2019 (COVID-19) is caused by a strain of coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Brown et al. 2022), which is a single-stranded positive-sense ribonucleic acid (RNA) virus that is contagious in humans (Samandar et al. 2022).
There are thousands of variants of SARS-CoV-2, which can further be clustered into much larger clades. Five noteworthy variants of SARS-CoV-2 were declared by WHO as Alpha, Beta, Gamma, Delta, and Omicron (Vass et al. 2022). SARS-CoV-2 is related to the SARS-CoV-1 virus that triggered the 2002-2004 SARS outbreak (Lopez 1622). COVID-19 was declared a pandemic in March 2020. As of 13 December 2022, it had caused 6.65 million deaths and over 649 million confirmed positive cases. Figure 1 shows the COVID-19-related information per country up to 8 December 2022.
There are three popular diagnosis approaches. The first is viral tests (Mak et al. 2022), generally via a nasopharyngeal swab (Savela et al. 2022), which involve analyzing samples to gauge the current presence of SARS-CoV-2. Viral tests include the nucleic acid amplification test (NAAT) (Stadelman et al. 2022) and the antigen test (Urrutikoetxea-Gutierrez et al. 2023). Other viral tests use nontraditional respiratory specimens.
The second is the antibody test (AT) (Jonczyk et al. 2022), which gauges the earlier presence of SARS-CoV-2, i.e., the previous infection. AT does not identify present infections (McCarthy et al. 1760). AT is utilized for public health inspection and epidemiologic goals.
The third is imaging approaches, among which chest computed tomography (CCT) (Ngoh et al. 2022) gives better diagnostic performance than chest X-ray and chest ultrasound (Bahrami-Motlagh et al. 2022).
Nevertheless, manual interpretation by radiologists, consultants, or physicians is tedious and easily influenced by inter-expert and intra-expert factors. Recently, several artificial intelligence (AI)- or deep learning (DL)-based models have been proposed. Yang (2018) presented a kernel-based extreme learning machine (K-ELM) classifier to detect pathological brains. Their method is effective and robust and can be used in COVID-19 diagnosis. Zhang (2022a) proposed a convolutional neural network (CNN) with stochastic pooling (SP), named CNN-SP. Li et al. (2020) proposed the COVID-19 detection neural network (COVNet) to detect both community-acquired pneumonia and COVID-19. Ni et al. (2020) presented a deep learning approach (DLA) to characterize COVID-19. Wang et al. (2020) proposed the weakly-supervised framework (WSF) for COVID-19 classification. Zhang (2022b) proposed three deep COVID network (DC-Net) models, which reached an average accuracy of 90.91%. Wu (2020) introduced wavelet Renyi entropy (WRE) to classify COVID-19 and presented a three-segment biogeography-based optimization to train the network. El-kenawy et al. (2020) presented the feature selection and voting classifier (FSVC) model for detecting COVID-19. Chen (2020) combined a gray-level co-occurrence matrix (GLCM) with a support vector machine (SVM). Hou (2022) proposed a 6-layer deep convolutional neural network, abbreviated as 6L-DCNN. Khan (2021) introduced the pseudo-Zernike moment (PZM) to assist in classifying COVID-19. Wang (2021) used the Jaya algorithm to classify COVID-19. Pi (2021) used the Schmitt neural network (SCNN) to classify COVID-19. Gafoor et al. (2022) developed a deep learning model (DLM) for detecting COVID-19 using chest X-ray images.
After analyzing the models mentioned above, we find that most deep learning models suffer from overfitting because their datasets contain few samples. Using machine learning (ML) on small datasets is problematic because the power of ML in recognizing patterns depends on the number of samples in the dataset. The smaller the dataset, the less powerful and less accurate the ML algorithms (Kokol et al. 2022).
To solve this issue, we introduce the exponential linear unit (ELU) that provides both faster learning and better generalization performance than traditional ReLU (Clevert et al. 2016). Our model is named ELU-based convolutional neural network (ELUCNN) for short and is compared with state-of-the-art (SOTA) models.
COVID-19-related mobile apps were developed in the past, such as COVIUAM (Montero-Contreras et al. 2021). Tsinaraki et al. (2021) investigated Google Play, Apple's App Store, relevant tweets, and digital media outlets. They listed recent mobile apps to fight the COVID-19 crisis. Inspired by these previous COVID-19-related mobile apps, we have developed a mobile app based on our model ELUCNN. In all, this study has five contributions: (i) We propose a 10-layer CNN from scratch to diagnose COVID-19. (ii) CELU is proposed by utilizing ELU to replace the traditional ReLU function. (iii) Multiple-way data augmentation is used to enlarge the training set. (iv) ELUCNN is proposed, which gives better performance than SOTA models. (v) We develop a mobile app for our model ELUCNN.

Dataset
The COVID-19 dataset is taken from Zhang (2022a), which used CCT to scan subjects from local hospitals. This dataset comprises 320 COVID-19-positive images and the same number of healthy control (HC) images. Each image has a size of $256 \times 256 \times 1$. Suppose the image set is symbolized as $V = \{v_k, k = 1, 2, \ldots, 640\}$; the labelling is carried out by three experts $\{E_1, E_2, E_3\}$, in which $(E_1, E_2)$ are junior experts while $E_3$ is a senior expert.
For each CCT image $v_k$, each expert makes his/her own decision $D(v_k, E_n), n = 1, 2, 3$. The labeling of $v_k$ is defined as $O(v_k)$:
$$O(v_k) = \begin{cases} D(v_k, E_1), & D(v_k, E_1) = D(v_k, E_2) \\ D(v_k, E_3), & \text{otherwise} \end{cases}$$
which means the output $O(v_k)$ is determined by the two junior experts if their opinions are identical. Otherwise, the output $O(v_k)$ is determined by the senior expert $E_3$. Figure 2 presents one sample of each category.
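The labeling rule above can be illustrated with a minimal Python sketch. The function name `consensus_label` and the string labels are hypothetical choices made for illustration; they are not taken from the original study.

```python
def consensus_label(d1, d2, d3):
    """Label one image from three expert decisions.

    d1, d2: decisions of the two junior experts (E1, E2).
    d3:     decision of the senior expert (E3).
    The juniors' label is used when they agree; otherwise the
    senior expert decides.
    """
    return d1 if d1 == d2 else d3


# Hypothetical decisions for two images (illustrative only).
decisions = [
    ("COVID-19", "COVID-19", "HC"),  # juniors agree, so their label is kept
    ("COVID-19", "HC", "HC"),        # juniors disagree, so the senior decides
]
labels = [consensus_label(*d) for d in decisions]
print(labels)
```

Note that the senior expert's opinion is consulted only when the two junior experts disagree.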

Proposed ELUCNN
10-layer CNN backbone model
Table 1 shows the abbreviations and their meanings. CNN is one of the most popular artificial neural networks and is particularly suitable for image processing. Generally, a CNN consists of convolutional layers (CLs), pooling layers (PLs), nonlinear activation function (NLAF) layers, and fully connected layers (FCLs). In addition, there are some auxiliary layers, such as normalization layers (Roburin et al. 2022), dropout layers (Garbin et al. 2020), crop layers, etc.
The basic layer is the CL. A complete CL (CCL) performs the 2D convolution operation along the width and height directions. Figure 3 illustrates the schematic of how an input feature map passes through a CCL. There are three actions during a CCL: (a) Kernel-based convolution, (b) Stack, and (c) NLAF.
Suppose we have an input feature map $A$, $L$ different kernels $Q_l, \forall l \in [1, \ldots, L]$, and an output feature map $D$ (the output $D$ signifies the output of the whole three-action CCL, not merely the output of the 2D kernel-based convolution operation). Here a CL stands for the layer running the convolution, and the CCL stands for the combination of the kernel-based convolution, the stack, and the NLAF layer.
For each kernel $Q_l$, the convolution output is
$$f(l) = A \otimes Q_l, \quad l = 1, \ldots, L$$
where $\otimes$ signifies the convolution operation. Afterward, all $f(l)$ matrices are stacked into a 3D matrix $F$:
$$F = h_s[f(1), f(2), \ldots, f(L)]$$
where $h_s$ stands for the stack action. Lastly, the matrix $F$ is delivered to the NLAF layer to produce the final matrix $D$ as
$$D = h_{NLAF}(F)$$
where $h_{NLAF}$ is the NLAF function, which we will discuss in Sect. 3.2. Assume the sizes of the three main constituents (input, kernel, and output) are
$$h_{size}(x) = \begin{cases} W_A \times H_A \times C_A, & x = A \\ W_Q \times H_Q \times C_Q, & x = Q_l \\ W_D \times H_D \times C_D, & x = D \end{cases}$$
where $h_{size}$ is the size function, and the triple elements $(W, H, C)$ signify the width, height, and number of channels of the feature map, respectively. The subscripts $A$, $Q$, and $D$ signify the input, the $l$-th kernel, and the output, respectively. $L$ stands for the total number of filters. Notice that $C_A = C_Q$, indicating that the number of channels of the input feature map $C_A$ should equal the number of channels of the kernel $C_Q$.
Suppose the kernel filters $\{Q_l, l = 1, 2, \ldots, L\}$ move with a padding of $b_p$ and a stride of $b_s$; it is easy to deduce the size $(W_D \times H_D \times C_D)$ of the output matrix $D$ as
$$W_D = 1 + \left\lfloor \frac{2 b_p + W_A - W_Q}{b_s} \right\rfloor, \quad H_D = 1 + \left\lfloor \frac{2 b_p + H_A - H_Q}{b_s} \right\rfloor, \quad C_D = L$$
in which $\lfloor \cdot \rfloor$ signifies the floor function, which is needed when either quotient is not an integer. The number of channels of the output $C_D$ equals the number of filters $L$.
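As a worked illustration of the output-size rule for a CCL, the short Python sketch below computes the output width, height, and channels from the input size, kernel size, padding, and stride. The function name and the example kernel size, stride, and filter count are hypothetical; the actual values of the backbone are those listed in Table 2.

```python
def conv_output_size(w_a, h_a, w_q, h_q, num_kernels, padding=0, stride=1):
    """Output size (W_D, H_D, C_D) of a convolution layer.

    w_a, h_a:    width and height of the input feature map A
    w_q, h_q:    width and height of each kernel Q_l
    num_kernels: number of kernels L (equals the output channels C_D)
    padding:     padding b_p applied on each side
    stride:      stride b_s
    """
    # Integer floor division implements the floor function.
    w_d = 1 + (2 * padding + w_a - w_q) // stride
    h_d = 1 + (2 * padding + h_a - h_q) // stride
    return w_d, h_d, num_kernels


# Hypothetical example: a 256 x 256 input, 32 kernels of size 3 x 3,
# padding 1 and stride 2.
print(conv_output_size(256, 256, 3, 3, num_kernels=32, padding=1, stride=2))
```

In this example the quotient (2 + 256 - 3) / 2 = 127.5 is not an integer, so the floor function is what yields a valid output width of 128.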
The backbone 10-layer CNN model is developed from scratch, which contains 8 CLs and 2 FCLs. Table 2 shows its structure, where # means the number, and KS means kernel size. We compare this 10-layer CNN model with other structures (such as 8-layer, 9-layer, and 11-layer) and find the 10-layer structure attains the best performance.

Proposed CELU and ELUCNN
For the functional form $h$ of different NLAFs, traditional NLAFs choose the sigmoid $h_{sig}$, defined as
$$h_{sig}(s) = \frac{1}{1 + e^{-s}}$$
with its derivative as
$$h'_{sig}(s) = h_{sig}(s)\,[1 - h_{sig}(s)]$$
The output of the sigmoid (Venugopal et al. 2022) is in the range of $[0, 1]$. In certain situations, the range $[-1, 1]$ is anticipated. $h_{sig}(s)$ may be shifted and rescaled to become the hyperbolic tangent (HT) function as
$$h_{HT}(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}} = 2 h_{sig}(2s) - 1$$
with its derivative as
$$h'_{HT}(s) = 1 - h_{HT}^{2}(s)$$
However, the wide saturation ranges of $h_{sig}$ and the hyperbolic tangent (Chandra 2022) function $h_{HT}$ cause gradient-based learning (GL) and its variants to run poorly during CNN training (see Fig. 4a, b). Therefore, the rectified linear unit (ReLU) $h_{ReLU}$ has gained popularity, since it accelerates the convergence of GL compared with $h_{sig}$ and $h_{HT}$.
The traditional ReLU function $h_{ReLU}$ is stated as
$$h_{ReLU}(s) = \max(0, s)$$
with its derivative as
$$h'_{ReLU}(s) = \begin{cases} 0, & s < 0 \\ 1, & s > 0 \end{cases}$$
If $s < 0$, the values of $h_{ReLU}$ are zero. Hence, ReLU is hard to train via GL in this range, since the corresponding gradients are entirely zero. The leaky ReLU (LReLU) (Nayef et al. 2022) and exponential linear unit (ELU) (Clevert et al. 2016) may alleviate this problem by altering the hard-zero range of ReLU. Mathematically, LReLU $h_{LReLU}(s)$ is stated as
$$h_{LReLU}(s) = \begin{cases} b\,s, & s < 0 \\ s, & s \ge 0 \end{cases}$$
where the parameter $b = 0.01$ is the frequently pre-assigned value. The derivative of $h_{LReLU}(s)$ is stated as
$$h'_{LReLU}(s) = \begin{cases} b, & s < 0 \\ 1, & s > 0 \end{cases}$$
ELU is defined as
$$h_{ELU}(s) = \begin{cases} s, & s > 0 \\ c\,(e^{s} - 1), & s \le 0 \end{cases}$$
where the default value of $c$ is 1 (Lin et al. 2019). We tested other values of $c$ and found that $c = 1$ achieved the best performance on the test set. ELU has already shown its superiority to other NLAFs in ductal carcinoma in situ (Zhang 2021), optical gating trace retrieval (Xu et al. 2021), etc. Figure 4c-e shows the curves of the other three NLAFs.
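The five NLAFs discussed above can be compared numerically with a small NumPy sketch. It is an illustrative implementation written for this text, not the code used in the study; the function names are hypothetical, and the default parameters (0.01 for LReLU, 1 for ELU) follow the values stated above.

```python
import numpy as np


def sigmoid(s):
    """Sigmoid, with output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))


def tanh_from_sigmoid(s):
    """Hyperbolic tangent written as a shifted, rescaled sigmoid."""
    return 2.0 * sigmoid(2.0 * s) - 1.0


def relu(s):
    """ReLU: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, s)


def leaky_relu(s, b=0.01):
    """Leaky ReLU with slope b on the negative side."""
    return np.where(s >= 0, s, b * s)


def elu(s, c=1.0):
    """ELU: identity for s > 0, c * (exp(s) - 1) otherwise."""
    # np.minimum keeps exp() from overflowing on large positive inputs,
    # because np.where evaluates both branches.
    return np.where(s > 0, s, c * (np.exp(np.minimum(s, 0.0)) - 1.0))


def elu_grad(s, c=1.0):
    """Derivative of ELU: 1 for s > 0, c * exp(s) otherwise."""
    return np.where(s > 0, 1.0, c * np.exp(np.minimum(s, 0.0)))


s = np.linspace(-3.0, 3.0, 7)
for name, fn in [("ReLU", relu), ("LReLU", leaky_relu), ("ELU", elu)]:
    print(name, np.round(fn(s), 4))
```

Unlike ReLU, the ELU derivative stays strictly positive for negative inputs, which is the property that keeps gradients flowing in that range.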
We use ELU to replace the NLAF in Table 2 and obtain the CELU block, as shown in Fig. 5b. Here CELU stands for Conv-ELU. The traditional convolutional block (CB) with ReLU is shown in Fig. 5a. The difference between CELU and CB is apparent by observing Fig. 5a, b, i.e., we replace the ReLU in traditional CB with ELU in our proposed CELU.
The structure of our proposed ELUCNN is shown in Fig. 5c. Note that we have $N_{CELU} = 8$ CELUs and $N_{FCL} = 2$ FCLs, and hence this deep neural network contains ten learnable layers.

Explainability of the proposed ELUCNN model
Here $Z(14)$ is used as the feature layer for generating explainable heatmaps (Papandrianos et al. 2022) by the gradient-weighted class activation mapping (Grad-CAM) method. The feature layer is expected to extract the AM when computing the Grad-CAM (Dworak and Baranowski 2022). The feature layer is usually the final layer with non-singleton spatial dimensions, so here we can only choose $Z(14)$.
Note that Grad-CAM is one of the post-hoc explainability (Mochaourab et al. 2022) methods. The post-hoc explainability method approximates the logic of our proposed ELUCNN model, intending to explain its internal workings so that human radiologists can understand its internal mechanism.
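To make the Grad-CAM computation concrete, the following NumPy sketch shows the core step under simplifying assumptions: the activation maps of the feature layer and the gradients of the target class score with respect to them are assumed to be already available as arrays (in practice they come from the deep learning framework). The function name `grad_cam` and the synthetic inputs are hypothetical and are not the authors' implementation.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Core Grad-CAM step for one image.

    activations: array of shape (H, W, K), feature maps of the feature layer
    gradients:   array of shape (H, W, K), gradients of the class score
                 with respect to those feature maps
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # One importance weight per channel: spatial average of the gradients.
    weights = gradients.mean(axis=(0, 1))
    # Weighted sum over channels, keeping only positive evidence.
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Synthetic stand-ins for the feature layer output and its gradients.
rng = np.random.default_rng(0)
activations = rng.random((8, 8, 16))
gradients = rng.normal(size=(8, 8, 16))
heatmap = grad_cam(activations, gradients)
print(heatmap.shape)
```

The resulting low-resolution heatmap is typically upsampled to the input image size and overlaid on the CCT image for display.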

Multiple-way data augmentation
G-fold cross-validation (CV) is used. The whole dataset is split into $G$ folds of almost equal size. Afterward, at the $g$-th trial ($g = 1, 2, \ldots, G$), the $g$-th fold is employed as the test set, and the remaining $G - 1$ folds $\{1, 2, \ldots, g-1, g+1, \ldots, G\}$ are used for training. This study chooses $G = 10$. Figure 6 shows an illustration of G-fold CV. In particular, the G-fold CV is repeated for $Z$ runs.
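The repeated G-fold protocol can be sketched in a few lines of NumPy. This is an illustrative index generator only: the function name is hypothetical, the shuffling seed is arbitrary, and the splits are not stratified by class, which is an assumption since the text does not state how folds were drawn.

```python
import numpy as np


def repeated_kfold_indices(n_samples, n_folds=10, n_runs=10, seed=0):
    """Yield (run, fold, train_idx, test_idx) for Z runs of G-fold CV."""
    rng = np.random.default_rng(seed)
    for run in range(n_runs):
        # A fresh shuffle per run gives a different partition each time.
        perm = rng.permutation(n_samples)
        folds = np.array_split(perm, n_folds)
        for g in range(n_folds):
            test_idx = folds[g]
            train_idx = np.concatenate(
                [folds[i] for i in range(n_folds) if i != g]
            )
            yield run, g, train_idx, test_idx


# 640 images, G = 10 folds, Z = 10 runs, as in this study.
splits = list(repeated_kfold_indices(640, n_folds=10, n_runs=10))
run, g, train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))
```

With 640 images and ten folds, each trial trains on 576 images and tests on 64, and data augmentation is applied to the training portion only.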
The training set is relatively small for deep neural network training. On the training set, we use multiple-way data augmentation (MDA) (Zhou 2021), which has been reported to perform better than the traditional data augmentation method. Figure 7 shows the schematic of MDA. Note that we add noise to the training images to make the training more robust (Andrade and Baan 2021). We use speckle noise, salt-and-pepper noise, and Gaussian noise.
First, $j_1$ different data augmentation (DA) methods are applied to the raw training image $r(x)$. Let $H_j, j = 1, \ldots, j_1$ represent each DA operation; we get the augmented images of $r$ as $H_j[r(x)]$. Suppose $j_2$ stands for the number of generated new images for each DA method. Then
$$|H_j[r(x)]| = j_2, \quad \forall j \in [1, \ldots, j_1]$$
where $|\cdot|$ denotes the number of elements in that set. Second, the horizontally mirrored image (HMI) $r^h$ is obtained as
$$r^h(x) = h_{HMI}[r(x)]$$
where $h_{HMI}$ signifies the horizontal mirror function. Third, all the $j_1$ different DA methods are carried out on the HMI $r^h$, generating $j_1$ different datasets $H_j[r^h(x)]$, each containing $j_2$ images.
Fourth, the raw image $r$, the HMI $r^h$, the $j_1$-way datasets of the raw image $H_j[r]$, and the $j_1$-way datasets of the HMI $H_j[r^h]$ are concatenated. The resulting dataset from $r$ is symbolized as $R$:
$$R = h_{con}\left\{ r, \; r^h, \; H_1[r], \ldots, H_{j_1}[r], \; H_1[r^h], \ldots, H_{j_1}[r^h] \right\}$$
where $h_{con}$ stands for the concatenation function. Assume the augmentation factor is $j_3$, representing the number of images in $R$; we deduce
$$j_3 = |R| = 2\,(1 + j_1 j_2)$$
This algorithm sets $j_1 = 9$ and $j_2 = 30$; thus, $j_3 = 542$.
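The four MDA steps can be sketched in NumPy as follows. This is a simplified illustration, not the authors' code: only the three noise-based DA methods named above are implemented (the study uses nine DA methods in total), the noise parameters are arbitrary assumptions, and all function names are hypothetical. Images are assumed to be 2D arrays scaled to [0, 1].

```python
import numpy as np


def augmentation_factor(j1, j2):
    """Number of images generated from one raw image: 2 * (1 + j1 * j2)."""
    return 2 * (1 + j1 * j2)


def gaussian_noise(img, rng, sigma=0.01):
    """Additive Gaussian noise (sigma is an illustrative choice)."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)


def speckle_noise(img, rng, sigma=0.05):
    """Multiplicative (speckle) noise."""
    return np.clip(img * (1.0 + rng.normal(0.0, sigma, img.shape)), 0.0, 1.0)


def salt_and_pepper_noise(img, rng, density=0.02):
    """Set a random fraction of pixels to 0 (pepper) or 1 (salt)."""
    out = img.copy()
    u = rng.random(img.shape)
    out[u < density / 2] = 0.0
    out[u > 1 - density / 2] = 1.0
    return out


def mda(img, ops, n_per_op, seed=0):
    """Concatenate raw image, mirrored image, and their augmented copies."""
    rng = np.random.default_rng(seed)
    mirrored = img[:, ::-1]  # horizontal mirror (left-right flip)
    result = [img, mirrored]
    for base in (img, mirrored):
        for op in ops:
            result.extend(op(base, rng) for _ in range(n_per_op))
    return result


ops = [gaussian_noise, speckle_noise, salt_and_pepper_noise]
raw = np.random.default_rng(1).random((16, 16))  # stand-in for a CCT image
augmented = mda(raw, ops, n_per_op=30)
print(len(augmented), augmentation_factor(9, 30))
```

With the three methods implemented here, one raw image yields 2 × (1 + 3 × 30) = 182 images; with the nine methods of the study, the same rule gives 542.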

Measures
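Remember, we carry out G-fold cross-validation for $Z$ runs. Suppose the confusion matrix $M$ over the $z$-th run is
$$M(z) = \begin{bmatrix} TP(z) & FN(z) \\ FP(z) & TN(z) \end{bmatrix}$$
where TP, FN, FP, and TN stand for the numbers of true positives, false negatives, false positives, and true negatives, respectively, with COVID-19 as the positive class and HC as the negative class. The sensitivity $i_1$, specificity $i_2$, precision $i_3$, and accuracy $i_4$ of the $z$-th run are defined as
$$i_1(z) = \frac{TP(z)}{TP(z) + FN(z)}, \quad i_2(z) = \frac{TN(z)}{TN(z) + FP(z)}, \quad i_3(z) = \frac{TP(z)}{TP(z) + FP(z)}, \quad i_4(z) = \frac{TP(z) + TN(z)}{TP(z) + TN(z) + FP(z) + FN(z)}$$
The F1 score of the $z$-th run $i_5(z)$ is defined as
$$i_5(z) = \frac{2\, i_3(z)\, i_1(z)}{i_3(z) + i_1(z)} = \frac{2\,TP(z)}{2\,TP(z) + FP(z) + FN(z)}$$
The Matthews correlation coefficient (MCC) is often used to measure binary classification. The MCC of the $z$-th run $i_6(z)$ is stated as
$$i_6(z) = \frac{TP(z)\,TN(z) - FP(z)\,FN(z)}{\sqrt{[TP(z) + FP(z)]\,[TP(z) + FN(z)]\,[TN(z) + FP(z)]\,[TN(z) + FN(z)]}}$$
In statistics, MCC is also known as the mean square contingency coefficient (Gietzen et al. 2022).
The Fowlkes-Mallows index (FMI) (Davagdorj et al. 2022) of the $z$-th run $i_7(z)$ is defined as
$$i_7(z) = \sqrt{\frac{TP(z)}{TP(z) + FP(z)} \times \frac{TP(z)}{TP(z) + FN(z)}} = \sqrt{i_3(z)\, i_1(z)}$$
After finishing all $Z$ runs, we deduce the mean and standard deviation (MSD, symbolized as $a \pm b$) of all seven measures as
$$a(i_m) = \frac{1}{Z} \sum_{z=1}^{Z} i_m(z), \quad b(i_m) = \sqrt{\frac{1}{Z - 1} \sum_{z=1}^{Z} \left| i_m(z) - a(i_m) \right|^2}, \quad m = 1, \ldots, 7$$
Moreover, the receiver operating characteristic (ROC) curve and the area under the curve (AUC) are reported based on ten runs.
Table 3 discloses the setting of hyperparameters. The optimal values are obtained using trial and error. The NLAF function is chosen as $h_{ELU}$. The parameter $c$ in ELU is set to 1. The number of CELU blocks is set to $N_{CELU} = 8$. The number of FCLs is set to $N_{FCL} = 2$. We run the tenfold CV 10 times. In total, we introduce $j_1 = 9$ different DA methods. The number of generated new images for each DA method is $j_2 = 30$. The augmentation factor is $j_3 = 542$.
Figure 8 presents the results of MDA, taking Fig. 2a as the raw image. Owing to the page limit, we do not present the HMI and its corresponding MDA results. From Fig. 8, we can see that MDA enhances the diversity of images in the training set, and thus it can help the model resist overfitting.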
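The seven measures and the mean and standard deviation over runs can be computed with a short, standard-library Python sketch. The function names and the example confusion-matrix counts are hypothetical and chosen only for illustration; they are not results from this study. The sample standard deviation (divisor Z − 1) is assumed.

```python
import math
import statistics


def seven_measures(tp, fn, fp, tn):
    """Seven measures from one confusion matrix (COVID-19 = positive).

    No guard against zero denominators is included; this sketch assumes
    every class is both present and predicted at least once.
    """
    sen = tp / (tp + fn)                      # sensitivity
    spc = tn / (tn + fp)                      # specificity
    prc = tp / (tp + fp)                      # precision
    acc = (tp + tn) / (tp + fn + fp + tn)     # accuracy
    f1 = 2 * tp / (2 * tp + fp + fn)          # F1 score
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )                                         # Matthews correlation coefficient
    fmi = math.sqrt(prc * sen)                # Fowlkes-Mallows index
    return {
        "sensitivity": sen, "specificity": spc, "precision": prc,
        "accuracy": acc, "f1": f1, "mcc": mcc, "fmi": fmi,
    }


def mean_std(values):
    """Mean and sample standard deviation over the Z runs."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical counts for one run over 640 images (illustrative only).
m = seven_measures(tp=302, fn=18, fp=16, tn=304)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}")
```

In a full experiment, `seven_measures` would be applied to the confusion matrix of each of the ten runs, and `mean_std` to the resulting ten values of each measure.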

Result of proposed ELUCNN and effectiveness of ELU
The results of ten runs of the tenfold CV of the proposed ELUCNN are shown in the first sub-table of Table 4. We then validate the effectiveness of ELU. We compare our ELUCNN model with the same backbone models with ReLU and LReLU, respectively. The corresponding models are named Model 1 and Model 2, i.e., $h_{NLAF} = h_{ReLU}$ in Model 1 and $h_{NLAF} = h_{LReLU}$ in Model 2. See the second row in Table 3.
The results of Model 1 and Model 2 are shown in the second and third sub-tables of Table 4. We observe that the ELUCNN model has a 0.81% accuracy increase compared to ReLU and a 0.67% accuracy increase compared to LReLU. Figure 9 draws the ROC curve comparison between the three models. We see that the proposed ELUCNN yields the area under curve (AUC) value of 0.9739, larger than Model 1 (with an AUC value of 0.9691) and Model 2 (with an AUC value of 0.9697). The results demonstrated the effectiveness of ELU.

Convergence of the proposed ELUCNN model
One typical run of the convergence plot of our ELUCNN model is shown in Fig. 10. The maximum iteration is 8537.
There is a sharp increase in the first 1500 iterations. Then the accuracies of both the training set and test set slowly rise from the 1500th iteration to the 6000th iteration. After the 6000th iteration, the test accuracy curve remains stable. The final test accuracy is 94.56%.

Comparison to SOTA models
This study compares the proposed ELUCNN model with SOTA COVID-19 diagnosis models on this entire 640-image dataset using ten runs of tenfold CV. K-ELM (Yang 2018) was originally developed for pathological brain detection. We modify their method and adapt it to our task. Table 5 lists the comparison results.
Here, the second column shows whether the model is a deep learning (DL) model or a non-deep learning (NDL) model. Figure 11 shows the model comparison. Since the $i_6$ values are the lowest among all seven measures, we move them to the rightmost position. This 3D bar plot shows that CNN-SP (Zhang 2022a) obtains the best sensitivity ($i_1$) value of 94.44%. COVNet (Li et al. 2020) obtains the best specificity ($i_2$) value of 95.72% and precision ($i_3$) value of 95.52%. Nevertheless, CNN-SP (Zhang 2022a) obtains a relatively low specificity value, and COVNet (Li et al. 2020) obtains a relatively low sensitivity value. Another drawback of CNN-SP (Zhang 2022a) is that it is relatively shallow, with only seven learnable layers. Its performance may be improved by adding more learnable layers and a more reliable tuning mechanism. For COVNet (Li et al. 2020), the sensitivity is 4.72% lower than the specificity, which indicates that its sensitivity and specificity are imbalanced.
The low sensitivity of COVNet (Li et al. 2020) is undesirable for hospitals, since detecting COVID-19 patients is much more important than detecting healthy subjects.
PZM (Khan 2021) manually extracts features rather than learning them; hence, those manually extracted features may not be optimal for the COVID-19 diagnosis task. Also, the PZM (Khan 2021) method has only 4 layers. In contrast, the backbone of our ELUCNN model is a 10-layer deep neural network. DC-Net (Zhang 2022b) uses three randomized neural networks (RNNs). The authors find that the random vector functional link (RVFL) gives the best result. Their method can be improved by using deep RNNs.
In all, the proposed ELUCNN obtains the best results in terms of accuracy ($i_4$), F1 score ($i_5$), MCC ($i_6$), and FMI ($i_7$), which indicates that ELUCNN is more reliable and balanced than the other 14 SOTA methods. Besides, all measures of ELUCNN are above 94% except MCC, which suggests that our method has potential for use in the clinical environment.

Explainability of the ELUCNN model
Figure 12 shows the explainability of the proposed ELUCNN model. Figure 12a shows the raw COVID-19 image. Figure 12b shows the manual delineation by human radiologists. Figure 12c-f displays heatmaps of four different runs on Fig. 12a via the Grad-CAM method using $Z(14)$, as indicated in Fig. 5c. Remember that Grad-CAM in Fig. 5c can help determine the importance of each neuron at $Z(14)$ in the prediction of our proposed ELUCNN network by considering the gradients (Suri et al. 1482) of the target flowing through ELUCNN.
We can observe that the heatmaps generated by Grad-CAM and this ELUCNN model accurately capture all the diseased lesion regions. This explainability feature (Kavak et al. 2022) indicates the stability and reliability of our ELUCNN model, which can help radiologists and patients gain more confidence in and a deeper understanding of our developed ELUCNN model. We can infer that the insights from Grad-CAM, one of the post-hoc explainability methods, can help reduce the black-box effect in our ELUCNN model. In the future, we shall test our ELUCNN model in other hospitals to further validate its stability and reliability.

Mobile app
MATLAB app designer is used to create professional applications for both desktop and web apps. The input to this web app is any CCT image, and our ELUCNN model is integrated with this developed app. Figure 13a displays  Figure 13b shows the standalone desktop app's graphical user interface (GUI). Figure 13c displays the Screenshot of the web app that is accessed through a Google Chrome (Version: 105.0.5195.125) web browser. The web app is based on a client-server modeled structure, i.e., the user is provided services through an off-site server hosted by a third-party cloud service, viz., Microsoft Azure (Perumal et al. 2022) in this study. Our developed online web app can assist hospital clinicians in making decisions remotely and effectively. The users can upload their custom CCT images, and either the desktop or mobile app can give the diagnosis results by turning the knob into the correct label: COVID-19, HC, or None. Meanwhile, the app will automatically give the heatmap so the users can understand where the lesion is located.

Conclusions
This study first proposes a ten-layer backbone model. Then the traditional CB is transformed into the CELU block by replacing the ReLU activation function with the ELU activation function. Finally, the ELU-based CNN (ELUCNN) for COVID-19 diagnosis is proposed. Besides, the MDA tactic is utilized to enhance the training set. The performances of the proposed ELUCNN are proven to be better than the 14 SOTA models in terms of accuracy. Several questions still remain: (i) The network is not deep enough. (ii) Are there any other NLAFs that can help improve our model? (iii) Transfer learning may help our model. (iv) How can our model be tested in hospitals?
Future studies will try to create even deeper neural networks by adding the 'skip connections' inspired by ResNet. Also, recent NLAFs, such as parametric ELU (Trottier et al. 2017), will be tested. Pretrained models (such as DenseNet, Inception, EfficientNet (Petrini et al. 2022), ShuffleNet, etc.) in transfer learning may be combined together with our ELUCNN model to generate a new ensemble model. Finally, to use our model in reality, we shall test it in more hospitals and distribute the mobile app (See Fig. 13) to the end users in other hospitals. Thus, radiologists can remotely upload their CCT images and get the diagnosis results immediately.
Data availability The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Declarations
Conflict of interest The authors declare that the research was conducted without any commercial or financial relationships construed as a potential conflict of interest.
Ethical approval This article does not contain any studies with human participants performed by any of the authors. We use open access dataset from Ref. (Zhang 2022a).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.